Multi-centroidal duration generation algorithm for HMM-based TTS
Authors
Abstract
A novel method is proposed to improve duration prediction in HMM-based speech synthesis. Starting from the decision tree trained by the conventional HTS training method, the duration instances in every leaf node are further clustered into several classes by K-means clustering, and the mapping from context features to class labels is trained with a conditional random field (CRF). Instead of using the mean of a leaf node's Gaussian distribution as the predicted duration, the phoneme duration is predicted as a weighted sum of the centroids of these clustered classes. The weights are derived from the output probability given by the CRF for the input context features and from the prior probability obtained from the clustering results. Compared with the conventional HTS method, experiments show that the proposed method significantly reduces RMSE in objective evaluations and achieves better preference scores in subjective evaluations.
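The prediction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state how the CRF output probability and the cluster prior are combined, so the sketch assumes a normalized product of the two. The CRF itself is not trained here: `class_posteriors` is a hand-written, hypothetical stand-in for its output, and the function names, the toy durations (in frames), and the quantile-based initialization are all assumptions made for illustration.

```python
import numpy as np


def kmeans_1d(durations, k, n_iter=50):
    """Plain Lloyd's k-means on 1-D durations; returns (centroids, labels)."""
    x = np.asarray(durations, dtype=float)
    # Assumption: deterministic initialization at evenly spaced quantiles.
    centroids = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        new_centroids = np.array([
            x[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Recompute labels so they match the final centroids.
    labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
    return centroids, labels


def predict_duration(centroids, priors, class_posteriors):
    """Weighted sum of cluster centroids.

    Assumption: weights are the normalized product of the cluster prior
    and the classifier posterior; the source only says both are used.
    """
    w = np.asarray(priors, dtype=float) * np.asarray(class_posteriors, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, centroids))


# Toy duration instances (frames) that fell into one decision-tree leaf.
leaf_durations = [4, 5, 5, 6, 14, 15, 16, 15]
centroids, labels = kmeans_1d(leaf_durations, k=2)

# Cluster priors are the relative cluster sizes.
priors = np.bincount(labels, minlength=2) / len(labels)

# Hypothetical CRF output for one input context: P(class | context features).
class_posteriors = [0.2, 0.8]

baseline = float(np.mean(leaf_durations))  # conventional leaf-mean prediction
predicted = predict_duration(centroids, priors, class_posteriors)
print(baseline, predicted)
```

With these toy values the two clusters have centroids 5 and 15 frames. The leaf mean gives 10 frames for every context, while the weighted sum moves toward the cluster the classifier favors and gives 13 frames.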
Related resources
Improved generation of prosodic features in HMM-based Mandarin speech synthesis
The HMM-based text-to-speech system can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the prosodic features, like F0 and duration trajectories, generated by HMM-based speech synthesis are often excessively smoothed and lack prosodic variance. In HMM-based TTS, durations are typically modeled statistically using state duration probabili...
Based Speech Synthesis
This paper describes an approach to text-to-speech synthesis (TTS) based on HMM. In the proposed approach, speech spectral parameter sequences are generated from HMMs directly based on the maximum likelihood criterion. By considering the relationship between static and dynamic features during parameter generation, smooth spectral sequences are generated according to the statistics of static and dynamic...
Performance Analysis of Text To Speech Synthesis System Using HMM And Prosody Features With Parsing For Tamil Language
This paper describes a Hidden Markov Model (HMM) based text-to-speech (TTS) system and a prosody-based TTS system for producing natural-sounding synthetic speech in the Tamil language. The HMM-based system consists of two phases, training and synthesis. Tamil speech is first parameterized into spectral and excitation features using Glottal Inverse Filtering (GIF). The emotion present in the input text is...
Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for spectral parameter, pitch parameter and the state du...
A hybrid TTS between unit selection and HMM-based TTS under limited data conditions
The intelligibility of HMM-based TTS can reach that of the original speech. However, HMM-based TTS is far from natural. In contrast, unit selection TTS is currently the most natural-sounding TTS. However, its intelligibility and its naturalness in segmental duration and timing are not stable. Additionally, unit selection needs to store a huge amount of data for concatenation. Recently, hybrid a...
Publication date: 2013